
Add qwen3 model #1485

Open
pstjohn wants to merge 4 commits into NVIDIA:main from pstjohn:pstjohn/qwen3

Conversation

@pstjohn
Collaborator

@pstjohn pstjohn commented Feb 28, 2026

This PR includes changes from #1486; if we merge that first, the diff here will just be the qwen3 additions.

Includes a temporary fix for NVIDIA/TransformerEngine#2718, which we can remove once that fix is merged and included in the base image.

This adds the Qwen3 model (https://huggingface.co/Qwen/Qwen3-0.6B), specifically the dense variant, although MoE would be fairly easy to add with our Mixtral model recipe.

Key differences of Qwen3 vs. Llama3:

  • qk_norm layers, using TE's qk_norm_type and qk_norm_before_rope kwargs
  • sliding window attention (SWA), using the window_size kwarg on certain layers:
window_size=(
    (config.sliding_window, config.sliding_window)
    if config.layer_types[layer_idx] == "sliding_attention"
    else None
)
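The per-layer selection above can be sketched as a small standalone function. This is a hedged illustration: only the `layer_types` and `sliding_window` attributes and the `(left, right)` window tuple come from the PR description; `DummyConfig` and `window_size_for_layer` are made-up names, not the PR's actual code.

```python
# Hypothetical sketch of the per-layer window_size selection described above.
# Only `layer_types`, `sliding_window`, and the window tuple come from the
# PR text; the function and config names here are illustrative.
from dataclasses import dataclass, field


@dataclass
class DummyConfig:
    sliding_window: int = 4096
    layer_types: list = field(
        default_factory=lambda: ["full_attention", "sliding_attention"]
    )


def window_size_for_layer(config: DummyConfig, layer_idx: int):
    """Return TE's window_size tuple for SWA layers, None for full attention."""
    if config.layer_types[layer_idx] == "sliding_attention":
        return (config.sliding_window, config.sliding_window)
    return None


config = DummyConfig()
print(window_size_for_layer(config, 0))  # None
print(window_size_for_layer(config, 1))  # (4096, 4096)
```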

Summary by CodeRabbit

  • New Features

    • Added support for Qwen2.5 and Qwen3 models with TransformerEngine optimizations including FP8, MXFP8, sequence packing, and KV-cache inference
    • Added bidirectional model conversion utilities between HuggingFace and optimized formats
    • Added inference examples and checkpoint export functionality for Qwen3
  • Documentation

    • Added comprehensive README covering feature support, inference workflows, model conversion, and validation guidance
    • Added developer guide with testing and export instructions
  • Tests

    • Added comprehensive test framework with golden-value validation, FP8 quantization testing, and model initialization checks

@copy-pr-bot

copy-pr-bot bot commented Feb 28, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 28, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@pstjohn pstjohn marked this pull request as ready for review February 28, 2026 15:41
@pstjohn pstjohn force-pushed the pstjohn/qwen3 branch 3 times, most recently from 323b8a1 to 15d8949 on March 2, 2026 18:39
pstjohn added 2 commits March 3, 2026 10:05
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (9)
bionemo-recipes/models/qwen/Dockerfile (2)

1-5: Consider adding a non-root user for improved security.

The container runs as root by default. While this is common for development/research containers, adding a non-root user reduces the attack surface if the container is deployed in shared environments.

🔒 Proposed fix to add non-root user
 FROM nvcr.io/nvidia/pytorch:26.02-py3
 WORKDIR /workspace/bionemo
 COPY . .
 RUN --mount=type=cache,target=/root/.cache/pip \
     PIP_CONSTRAINT= pip install -r requirements.txt
+
+# Create non-root user for runtime
+RUN useradd -m -u 1000 bionemo && chown -R bionemo:bionemo /workspace/bionemo
+USER bionemo
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/Dockerfile` around lines 1 - 5, Add creation of a
non-root user and switch to it in the Dockerfile: create a user/group (e.g.,
bionemo) and set HOME, chown the WORKDIR (/workspace/bionemo) and any cache
directories (e.g., /root/.cache or change pip cache path) to that user before
switching; update RUN steps like the pip install to run as the non-root user and
add a final USER bionemo line. Ensure the USER creation and chown occur after
base image setup but before running application commands so functions
referencing WORKDIR, RUN --mount=type=cache,target=/root/.cache/pip, and COPY .
. will run under the non-root account.

3-3: Expand .dockerignore to exclude more build artifacts.

A .dockerignore exists at the repository root, but it's missing common exclusions like .git/ and __pycache__/. Since COPY . . copies the entire build context, expanding .dockerignore to include these would reduce image size and avoid unintended file inclusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/Dockerfile` at line 3, The Dockerfile uses a
broad COPY . . which pulls the entire context; update the repository
.dockerignore to exclude build artifacts and VCS files so they are not copied
into the image. Add entries such as .git/, __pycache__/, *.pyc, .venv/, venv/,
build/, dist/, .pytest_cache/, .DS_Store and node_modules/ (as applicable) to
the .dockerignore to shrink the build context and prevent unintended files from
being included.
bionemo-recipes/models/qwen/requirements.txt (1)

1-5: Consider version constraints, but note the project pattern.

This file follows the same unpinned dependency approach used across all recipe models (esm2, llama3, mixtral). While version pinning is generally a best practice for reproducibility, the project appears to rely on container image versioning instead. All models with pyproject.toml (amplify, geneformer) do pin versions, but requirements.txt files consistently remain unpinned.

If tighter reproducibility is required, consider adding a pyproject.toml for this recipe model, or specify version bounds like the other projects do in their setup files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/requirements.txt` around lines 1 - 5, The
requirements.txt for the qwen recipe lists unpinned dependencies (lm-eval,
torch, torchao, transformer_engine[pytorch], transformers) which breaks
reproducibility; fix by either adding a pyproject.toml for this recipe model
that pins or bounds versions consistent with other pinned projects (e.g., match
amplify/geneformer ranges) or add explicit version constraints/upper/lower
bounds for each package in requirements.txt (include torch, torchao,
transformer_engine[pytorch], transformers, lm-eval) to align with the project's
reproducibility policy.
bionemo-recipes/models/qwen/tests/conftest.py (1)

22-25: Replace brittle multi-level sys.path mutations with direct local imports.

The pattern of appending both parent.parent and parent directories combined with pytest_plugins = ["tests.common.fixtures"] creates collision risk in this monorepo. Since 4 model directories each have a tests/common/fixtures.py module, pytest could resolve the wrong package depending on sys.path order and import sequencing.

🔧 Suggested refactor
-sys.path.append(Path(__file__).parent.parent.as_posix())
-sys.path.append(Path(__file__).parent.as_posix())
-
-pytest_plugins = ["tests.common.fixtures"]
+tests_dir = Path(__file__).parent
+if tests_dir.as_posix() not in sys.path:
+    sys.path.insert(0, tests_dir.as_posix())
+
+pytest_plugins = ["common.fixtures"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/tests/conftest.py` around lines 22 - 25, Remove
the brittle sys.path.append manipulations in conftest.py and instead import the
local fixtures module directly to avoid cross-package resolution: delete the two
sys.path.append(...) calls, replace the string-based pytest_plugins =
["tests.common.fixtures"] with an explicit local import (e.g. from .common
import fixtures) and set pytest_plugins = [fixtures] (or import
tests.common.fixtures as fixtures if relative import is unsuitable); this keeps
the unique module object (fixtures) referenced and prevents accidental
resolution of other packages named tests.common.fixtures.
bionemo-recipes/models/qwen/tests/test_modeling_qwen3_te.py (3)

64-66: Consider using full commit SHA for reproducibility.

The short hash c1899de could potentially become ambiguous as the repository grows. Using the full 40-character SHA ensures deterministic model loading.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/tests/test_modeling_qwen3_te.py` around lines 64
- 66, Replace the short commit hash in get_upstream_model_revision with the full
40-character commit SHA to ensure deterministic model loading; locate the
get_upstream_model_revision() method and change the returned string "c1899de" to
the repository's full commit SHA for that revision.

38-41: Bare module imports may fail without specific PYTHONPATH setup.

These imports assume the package root is in sys.path. Consider using explicit relative imports for maintainability and to avoid import errors in different execution contexts.

♻️ Suggested fix using relative imports
-from collator import DataCollatorWithFlattening
-from convert_qwen3 import convert_qwen3_hf_to_te, convert_qwen3_te_to_hf
-from modeling_qwen3_te import HFInferenceParams, NVQwen3Config, NVQwen3ForCausalLM
-from tests.common import BaseModelTest, TestTolerances
+from ..collator import DataCollatorWithFlattening
+from ..convert_qwen3 import convert_qwen3_hf_to_te, convert_qwen3_te_to_hf
+from ..modeling_qwen3_te import HFInferenceParams, NVQwen3Config, NVQwen3ForCausalLM
+from .common import BaseModelTest, TestTolerances
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/tests/test_modeling_qwen3_te.py` around lines 38
- 41, The test file uses bare module imports that can break when PYTHONPATH
isn't set; replace them with explicit relative imports to the package root so
the test runs reliably. Update the import statements for
DataCollatorWithFlattening, convert_qwen3_hf_to_te, convert_qwen3_te_to_hf,
HFInferenceParams, NVQwen3Config, NVQwen3ForCausalLM, BaseModelTest, and
TestTolerances to use relative import syntax (e.g., from .collator import
DataCollatorWithFlattening or from ..models.convert_qwen3 import
convert_qwen3_hf_to_te) matching the test module's package level.

134-143: Tolerance values seem reasonable but are worth documenting.

The logits absolute tolerance of 2.0 and context-parallel loss tolerances (0.5 atol, 25% rtol) are fairly permissive. If these were determined empirically for Qwen3 numerical characteristics, consider adding a brief comment explaining the rationale.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/tests/test_modeling_qwen3_te.py` around lines 134
- 143, Add a short explanatory comment above the get_tolerances method
describing why the chosen tolerances (golden_value_logits_atol=2.0 and
cp_loss_atol=0.5, cp_loss_rtol=0.25) are permissive — e.g., note they were
derived empirically from Qwen3 numerical variability or account for expected
floating-point/noise characteristics — so future readers understand the
rationale; update the docstring or add an inline comment referencing
get_tolerances and the TestTolerances fields to make the source of these
specific numeric choices clear.
bionemo-recipes/models/qwen/modeling_qwen3_te.py (2)

220-220: Replace assert with explicit exception for production code.

Using assert for input validation can be silently skipped when Python runs with optimizations (-O flag). Use an explicit ValueError or similar exception.

♻️ Suggested fix
-            assert attention_mask is not None, "Attention mask is required when packing BSHD inputs."
+            if attention_mask is None:
+                raise ValueError("Attention mask is required when packing BSHD inputs.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/modeling_qwen3_te.py` at line 220, Replace the
runtime-unsafe assertion with an explicit exception: instead of using assert
attention_mask is not None in the function/method that validates BSHD inputs
(the code referencing attention_mask), raise a ValueError (or appropriate
exception) with the same message ("Attention mask is required when packing BSHD
inputs.") so the check cannot be skipped under Python -O and will always enforce
the required input.

380-380: Global dynamo config mutation at module import time.

Setting torch._dynamo.config.capture_scalar_outputs = True at module level affects the entire process. This could cause unexpected behavior in other parts of the codebase that rely on different dynamo settings.

Consider documenting this side effect prominently or scoping it more narrowly if possible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/qwen/modeling_qwen3_te.py` at line 380, The module
currently mutates global Dynamo state by setting
torch._dynamo.config.capture_scalar_outputs = True at import time; instead,
remove the top-level assignment and scope the change only where needed (e.g.,
around the code path that invokes compilation/tracing). Replace the global set
with a scoped approach such as using
torch._dynamo.config.patch(capture_scalar_outputs=True) (or save the original
value and restore it around the critical function) in the function that performs
tracing/compilation in modeling_qwen3_te.py so other modules are not affected by
a global config change.
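The scoping this comment asks for can be illustrated without torch at all. The sketch below uses a stand-in config object to show the save-and-restore pattern; the real fix would simply wrap the compile/trace call in `torch._dynamo.config.patch(capture_scalar_outputs=True)`.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for torch._dynamo.config; the real fix would use
# torch._dynamo.config.patch(capture_scalar_outputs=True) directly.
dynamo_config = SimpleNamespace(capture_scalar_outputs=False)


@contextmanager
def patched(config, **overrides):
    """Temporarily set config attributes, restoring originals on exit."""
    saved = {k: getattr(config, k) for k in overrides}
    try:
        for k, v in overrides.items():
            setattr(config, k, v)
        yield config
    finally:
        for k, v in saved.items():
            setattr(config, k, v)


with patched(dynamo_config, capture_scalar_outputs=True):
    assert dynamo_config.capture_scalar_outputs is True
assert dynamo_config.capture_scalar_outputs is False  # restored on exit
```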
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/models/qwen/collator.py`:
- Around line 975-977: The current calculation of slice_sizes uses floor
division (slice_sizes = (cu_seqlens_padded[1:] - cu_seqlens_padded[:-1]) //
total_slices_of_any_sequence) which silently drops tokens when sequence lengths
are not divisible by total_slices_of_any_sequence (2 * cp_world_size); update
the logic around total_slices_of_any_sequence / slice_sizes / cu_seqlens_padded
to handle remainders instead of truncating — compute the base size and remainder
for each sequence (e.g., base = length // total_slices_of_any_sequence; rem =
length % total_slices_of_any_sequence) and distribute the rem extra tokens
across the first rem slices (or use ceil semantics and adjust offsets) so that
sum(slice_sizes) == original sequence length for every entry; ensure any
downstream code that consumes slice_sizes (referenced by
total_slices_of_any_sequence, slice_sizes, cu_seqlens_padded) uses these
adjusted slice sizes.
- Around line 435-437: The code sets max_length using the full unsharded
sequence length (max_length = batch["input_ids"].shape[1]) in the elif
self.qkv_format == "bshd" branch, which overstates per-CP-shard
max_length_q/max_length_k; change this to use the shard-local sequence length
instead: compute a shard_seq_len (either from an already-sharded tensor in
batch, e.g. a per-shard slice like batch["input_ids_shard"] if available, or by
dividing batch["input_ids"].shape[1] by self.cp_shards) and assign max_length =
shard_seq_len (and use that for max_length_q/max_length_k calculations) so each
CP shard uses its local sequence length in collator.py where qkv_format ==
"bshd".
- Around line 1015-1017: The function _scatter_batch_to_cp_tp_ranks currently
forbids None for all_batches but callsites sometimes pass None (e.g., where
combined_batch = ... if self.cp_tp_rank == 0 else None), so update the type
annotations to accept None and reflect that None can be returned: change the
parameter type to allow None (e.g., list[BatchType] | list[StopIteration] | None
or Optional[list[...]] | Optional[list[StopIteration]]) and adjust the return
type to BatchType | StopIteration | None; update any related type hints/imports
so Pyright no longer flags the callsite passing None.
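The remainder-aware splitting suggested in the first collator.py comment can be sketched in isolation. This is a pure-Python stand-in: the real code operates on torch tensors and `cu_seqlens_padded` offsets, and `split_with_remainder` is an illustrative name.

```python
# Illustrative version of the remainder-aware slice sizing suggested for
# collator.py. Floor division alone would silently drop the remainder
# tokens; here the first `length % total_slices` slices absorb one extra
# token each so the sizes always sum back to the sequence length.
def split_with_remainder(length: int, total_slices: int) -> list[int]:
    base, rem = divmod(length, total_slices)
    return [base + 1 if i < rem else base for i in range(total_slices)]


# A sequence of 10 tokens split across 2 * cp_world_size = 4 slices:
sizes = split_with_remainder(10, 4)
print(sizes)             # [3, 3, 2, 2]
print(sum(sizes) == 10)  # True — no tokens dropped
```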

In `@bionemo-recipes/models/qwen/convert_qwen2.py`:
- Around line 244-248: The TE→HF conversion currently always calls
output_model.tie_weights() and unconditionally uses model_hf._tied_weights_keys
for state_dict_ignored_entries, which breaks roundtrips when tie_word_embeddings
is False; update the TE→HF code around output_model.tie_weights() and the
state_dict_ignored_entries assignment to check
model_hf.config.tie_word_embeddings and only call output_model.tie_weights() and
set state_dict_ignored_entries = model_hf._tied_weights_keys when that flag is
True, otherwise leave state_dict_ignored_entries as an empty list and skip tying
weights so the behavior matches the HF→TE branch.
- Around line 75-87: The current code builds index tensors on CPU and then calls
.cpu() on the resulting bias slices (q_bias/k_bias/v_bias), which forces CUDA
qkv_bias onto CPU; instead, construct q_slice, k_slice, v_slice on the same
device as qkv_bias (e.g., torch.arange(..., device=qkv_bias.device)) and remove
the trailing .cpu() calls so q_bias, k_bias, v_bias remain on qkv_bias.device;
update the lines that create q_slice/k_slice/v_slice and the assignments to
q_bias/k_bias/v_bias in convert_qwen2.py accordingly (symbols: q_slice, k_slice,
v_slice, qkv_bias, q_bias, k_bias, v_bias, heads_per_group, num_query_groups,
qkv_total_dim).

In `@bionemo-recipes/models/qwen/export.py`:
- Line 56: The copy uses a relative path, so make it robust by resolving the
source file relative to this module: compute src = Path(__file__).parent /
"modeling_qwen3_te.py" (import Path from pathlib if needed) and pass that src to
shutil.copy(src, export_path / "modeling_qwen3_te.py"); update the copy call in
export.py to use this resolved Path instead of the relative string.

In `@bionemo-recipes/models/qwen/modeling_qwen2_te.py`:
- Around line 208-216: The code assumes input_ids exists when computing
padded_seq_len and when handling packing/unpacking; guard those accesses in the
should_pack_inputs path (and the other similar block around the
_unpad_input/_pad_input calls). Specifically, in the forward path where
should_pack_inputs is True and where you later reference input_ids.size(1),
replace direct input_ids usage with a conditional: if input_ids is not None use
input_ids.size(1) otherwise derive the sequence length from
inputs_embeds/hidden_states (e.g., hidden_states.size(1)); apply the same guard
for the other block around the _pad_input/_unpad_input calls so
input-only-embeds inputs do not raise AttributeError. Ensure you adjust variable
name padded_seq_len and any logic that uses it (e.g., the call sites of
_unpad_input/_pad_input) to use the guarded value.

In `@bionemo-recipes/models/qwen/modeling_qwen3_te.py`:
- Around line 141-143: The code uses config.layer_types[layer_idx] without
guarding for missing layer_types or a None sliding_window; update the
conditional that sets window_size (the expression around
window_size=(config.sliding_window, config.sliding_window) if
config.layer_types[layer_idx] == "sliding_attention" else None) to first verify
that config.layer_types exists and has an entry at layer_idx (e.g.,
hasattr/config.layer_types is not None and len(config.layer_types) > layer_idx)
and also check config.sliding_window is not None before treating the layer as
"sliding_attention"; if any check fails, set window_size to None.

In `@bionemo-recipes/models/qwen/state.py`:
- Around line 167-181: The code currently prints and continues when a target
parameter name is missing from target_state, which can mask unpopulated weights;
in the loop over target.named_parameters() (using variables name, param and the
target_state dict), replace the print(...) branch with a hard failure: raise a
descriptive ValueError (or RuntimeError) that includes the missing parameter
name and possibly remaining keys in target_state so conversion coverage is
clear, ensuring that _params is only populated when nn.Parameter(target_param,
requires_grad=param.requires_grad) is created and preventing silent
continuation.
- Around line 437-442: When there are no wildcard_matches the code uses
np.empty(shape, dtype=object) which leaves slots uninitialized; replace that
with a deterministic initialization (e.g. use np.full(shape, [], dtype=object)
or create an array populated with an explicit empty value) so the
`_match_keys`/output_array is reproducibly an empty container for no-match
cases; update the block that checks len(wildcard_matches) == 0 (variables:
wildcard_matches, shape, output_array, np.empty) to allocate and fill the array
deterministically.

In `@bionemo-recipes/models/qwen/tests/common/fixtures.py`:
- Around line 40-52: The fixture use_te_debug currently unconditionally pops
NVTE_DEBUG on teardown and thus can erase pre-existing environment values;
update use_te_debug (and any other TE-related fixtures that set env vars) to
save the original value (orig = os.environ.get("NVTE_DEBUG")) before setting it,
then on teardown restore it by setting it back if orig is not None or deleting
it only if orig was None; apply the same pattern for other fixtures in this file
that modify env vars so tests leave the environment exactly as they found it.
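The save-and-restore pattern the fixtures comment describes looks like this in isolation. A hedged sketch: the real code is a pytest fixture in tests/common/fixtures.py, but the `yield`-based restore logic is the same whether written as a fixture or, as here, a plain context manager.

```python
import os
from contextlib import contextmanager


@contextmanager
def nvte_debug_env(value: str = "1"):
    """Set NVTE_DEBUG, then restore whatever value existed beforehand."""
    orig = os.environ.get("NVTE_DEBUG")  # remember any pre-existing value
    os.environ["NVTE_DEBUG"] = value
    try:
        yield
    finally:
        if orig is None:
            os.environ.pop("NVTE_DEBUG", None)  # we introduced it; remove it
        else:
            os.environ["NVTE_DEBUG"] = orig  # restore the caller's value


os.environ["NVTE_DEBUG"] = "2"
with nvte_debug_env("1"):
    assert os.environ["NVTE_DEBUG"] == "1"
assert os.environ["NVTE_DEBUG"] == "2"  # pre-existing value survives teardown
```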

In `@bionemo-recipes/models/qwen/tests/test_modeling_qwen2_te.py`:
- Around line 68-73: The tokenizer is being loaded without the pinned revision,
causing a mismatch with the pinned model; update get_tokenizer to call
AutoTokenizer.from_pretrained with the same pinned revision by passing
revision=self.get_upstream_model_revision() (keep the existing pad_token
fallback logic), referencing the get_tokenizer method and
get_upstream_model_revision helper so the tokenizer and model use identical
revisions for reproducible tests.

In `@ci/scripts/check_copied_files.py`:
- Line 41: Update the destination paths in the list inside check_copied_files.py
that currently point to "bionemo-recipes/models/qwen3/..." (example entry
"bionemo-recipes/models/qwen3/collator.py") to the correct
"bionemo-recipes/models/qwen/..." paths (also fix the other two occurrences at
the other entries). This will ensure the validation list of expected
destinations (the list containing "bionemo-recipes/models/qwen3/collator.py")
matches the actual added files and prevents the ValueError raised by the
copied-files check.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30baad7 and b597c6b.

📒 Files selected for processing (20)
  • bionemo-recipes/models/qwen/.ruff.toml
  • bionemo-recipes/models/qwen/Dockerfile
  • bionemo-recipes/models/qwen/README.md
  • bionemo-recipes/models/qwen/collator.py
  • bionemo-recipes/models/qwen/convert_qwen2.py
  • bionemo-recipes/models/qwen/convert_qwen3.py
  • bionemo-recipes/models/qwen/export.py
  • bionemo-recipes/models/qwen/modeling_qwen2_te.py
  • bionemo-recipes/models/qwen/modeling_qwen3_te.py
  • bionemo-recipes/models/qwen/requirements.txt
  • bionemo-recipes/models/qwen/state.py
  • bionemo-recipes/models/qwen/tests/__init__.py
  • bionemo-recipes/models/qwen/tests/common/README.md
  • bionemo-recipes/models/qwen/tests/common/__init__.py
  • bionemo-recipes/models/qwen/tests/common/fixtures.py
  • bionemo-recipes/models/qwen/tests/common/test_modeling_common.py
  • bionemo-recipes/models/qwen/tests/conftest.py
  • bionemo-recipes/models/qwen/tests/test_modeling_qwen2_te.py
  • bionemo-recipes/models/qwen/tests/test_modeling_qwen3_te.py
  • ci/scripts/check_copied_files.py

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
@pstjohn
Collaborator Author

pstjohn commented Mar 3, 2026

@coderabbitai resolve

@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

✅ Actions performed

Comments resolved. Auto-approval is disabled; enable reviews.request_changes_workflow to approve automatically.
